Virology 166, 416-422 (1988)


Sequence of Mouse Hepatitis Virus A59 mRNA 2: Indications for RNA Recombination between Coronaviruses and Influenza C Virus
The nucleotide sequence of the unique region of coronavirus MHV-A59 mRNA 2 has been determined. Two open reading frames (ORF) are predicted: ORF1 potentially encodes a protein of 261 amino acids, and its amino acid sequence contains elements that indicate nucleotide-binding properties. ORF2 predicts a protein of 413 amino acids; it lacks a translation initiation codon and is therefore probably a pseudogene. The amino acid sequence of ORF2 shares 30% homology with the HA1 hemagglutinin sequence of influenza C virus. A short stretch of nucleotides immediately upstream of ORF2 shares 83% homology with MHC class I nucleotide sequences. We discuss the possibility that both similarities are the result of recombination events and present a model for the acquisition and subsequent inactivation of ORF2; the model also applies to MHV-A59-related coronaviruses, in which we expect ORF2 to be still functional.
INTRODUCTION

Murine hepatitis virus (MHV) is the most widely studied member of the Coronaviridae. This family of enveloped, single-stranded RNA viruses causes considerable economic loss, since coronavirus infections can severely affect cattle, poultry, and pets. Human coronavirus OC43 causes the common cold in man. Murine coronaviruses are of particular interest because several strains can cause a (chronic) demyelinating disease in rats and mice. For this reason the pathogenesis of MHV infections is studied as an animal model for virus-induced demyelination (Wege et al., 1982). MHV-A59 virions contain an infectious RNA genome, about 30 kb in length, associated with a nucleocapsid protein (N). Two membrane proteins have been identified: the transmembrane glycoprotein E1 and the large surface glycoprotein E2 (Siddell et al., 1982). The MHV-A59 genome is composed of seven different regions (A to G), separated by short, very similar junction sequences (Bredenbeek et al., 1987). The messenger RNAs that are synthesized during infection are 3'-coterminal, and each extends to a different junction sequence in the 5'-direction. This results in a nested set of mRNAs, including the genome, in which each has a different "unique" region at its 5'-end (Leibowitz et al., 1981; Lai et al., 1981; Spaan et al., 1982). All mRNAs share a leader sequence of about 72 nucleotides (Spaan et al., 1983; Lai et al., 1984). In vitro translated MHV mRNAs encoding the structural proteins N, E1, and E2 and the 14.5K nonstructural protein are functionally monocistronic (Rottier et al., 1981; Siddell, 1983), and sequence analyses have shown that the coding regions are located at the 5'-end of these individual mRNAs (Siddell, 1987). There is one possible exception: sequence analysis of the 5'-end of mRNA 5 (region E) revealed two open reading frames (Skinner et al., 1985; Budzilowicz and Weiss, 1987). Whether both reading frames are used is not known.

The coronaviruses studied to date show an identical order of the genes encoding the structural proteins: 5'-E2-E1-N-3' (De Groot et al., 1987). Among coronaviruses these genes are highly homologous. In contrast, differences are found in the structure and number of the genes encoding the nonstructural proteins, which is reflected in the number of subgenomic mRNAs synthesized by each coronavirus. In infectious bronchitis virus (IBV), feline infectious peritonitis virus (FIPV), and its close relative transmissible gastroenteritis virus (TGEV), members of antigenic clusters different from that of MHV, the largest subgenomic mRNA encodes the peplomer protein E2 or S (Binns et al., 1985; Niesters et al., 1986; De Groot et al., 1987; Rasschaert and Laude, 1987; Jacobs et al., 1987). In contrast, in MHV-infected cells an additional, larger RNA (mRNA 2) has been identified (Spaan et al., 1981; Weiss and Leibowitz, 1983). In vitro translation of this mRNA yields a 30K-35K protein (Leibowitz et al., 1982; Siddell, 1983). In MHV-JHM-infected cells, small amounts of a 30K protein can be detected (Siddell et al., 1981). However, the size of the unique region of mRNA 2, approximately 2 kb, indicates a larger coding capacity.

In order to study the function of mRNA 2 we have cloned and sequenced region B of MHV-A59. Here we present its primary structure and show that it contains two open reading frames (ORF). The predicted amino acid sequence of the second ORF is remarkably similar to the HA1 sequence of the hemagglutinin protein of influenza C virus. We discuss the possibility that this ORF has been acquired by a recombination event.

MATERIALS AND METHODS 
cDNA synthesis and cloning 

An MHV-A59-specific cDNA library was created using random primers on purified genomic RNA. Procedures were identical to those described previously (Luytjes et al., 1987). Full details will be presented elsewhere (P. J. Bredenbeek et al., manuscript in preparation).

Selection and analysis of cDNA clones 

Recombinant cDNA clones were selected by hybridization (Meinkoth and Wahl, 1984) to oligonucleotide probes specific for the viral mRNAs (P. J. Bredenbeek et al., manuscript in preparation). Plasmid DNA from recombinant clones was prepared according to Birnboim and Doly (1979). Inserts were subcloned into M13 vectors (Messing, 1983). M13 subclones specific for the unique region of mRNA 2 were selected by hybridizing phage supernatant to pentamer-primed probes (Feinberg and Vogelstein, 1983; Roberts and Wilson, 1985) made from cDNA clones previously selected with oligonucleotides.

DNA sequence analysis 

Sequence analysis was essentially done according to Sanger et al. (1977). Sequence data were assembled by computer using the Staden program package (Staden, 1986).

Similarity search of protein sequences 

The predicted amino acid sequences were compared to the National Biomedical Research Foundation (NBRF) Protein Library (release 11) using the FASTP program set created by Lipman and Pearson (1985). Additional analysis of similarities was carried out with the DIAGON program of Staden (1982).
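DIAGON produces a dot-matrix (diagonal) comparison of two sequences. As a minimal sketch of the idea only, and not of DIAGON's actual scoring scheme, the function below marks every pair of window start positions whose residues agree at least `min_matches` times along the diagonal. The names `dot_plot`, `render`, `window`, and `min_matches` are hypothetical, and plain identity scoring stands in for a substitution matrix.

```python
def dot_plot(a: str, b: str, window: int = 5, min_matches: int = 4) -> list[tuple[int, int]]:
    """Return (i, j) pairs where the windows a[i:i+window] and b[j:j+window]
    have at least `min_matches` identical residues at corresponding positions."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Count identities along the diagonal segment starting at (i, j).
            score = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if score >= min_matches:
                hits.append((i, j))
    return hits


def render(a: str, b: str, hits: list[tuple[int, int]]) -> str:
    """Draw a text dot matrix: rows follow `a`, columns follow `b`."""
    marked = set(hits)
    rows = []
    for i in range(len(a)):
        rows.append("".join("*" if (i, j) in marked else "." for j in range(len(b))))
    return "\n".join(rows)


if __name__ == "__main__":
    hits = dot_plot("ACDEFG", "XACDEFG", window=5, min_matches=5)
    print(hits)  # [(0, 1), (1, 2)]
    print(render("ACDEFG", "XACDEFG", hits))
```

A run of hits along one diagonal indicates a similar segment; lowering `min_matches` tolerates substitutions at the cost of more background dots.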

RESULTS 

Isolation of region B specific cDNA clones 

We have recently constructed an almost complete random-primed cDNA library of the MHV-A59 genome. A set of oligonucleotides was synthesized based upon the sequence of previously obtained MHV-A59-specific cDNA clones that had been mapped on the viral mRNAs (P. J. Bredenbeek, manuscript in preparation).


Oligonucleotides OL 4 (specific for mRNA 1), OL 6 (mRNA 2), and OL 7 (mRNA 3; see Luytjes et al., 1987) were used to screen the cDNA library for clones covering region B. Two completely overlapping clones (30, 96) and several clones with partial overlaps (4D, 35, F71, 95, 918) were isolated. Clone 96 was digested with Sau3A and subsequently ligated into the BamHI site of M13mp9. The other selected cDNA clones were subcloned using restriction enzymes as indicated in Fig. 1. Each nucleotide of region B was determined on at least two different cDNA clones, and selected regions on three or more cDNA clones.
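The requirement that every nucleotide be determined on at least two clones can be checked by computing sequencing depth along the region. The sketch below assumes that each clone or subclone read is reduced to a half-open interval [start, end) in region B coordinates; the functions `coverage` and `undercovered` and the interval values in the usage are hypothetical and do not correspond to the clones in Fig. 1.

```python
def coverage(length: int, reads: list[tuple[int, int]]) -> list[int]:
    """Per-position depth over [0, length) for half-open read intervals [start, end)."""
    diff = [0] * (length + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, length)
        if start < end:
            diff[start] += 1  # depth rises where a read begins
            diff[end] -= 1    # and falls just after it ends
    depth, running = [], 0
    for pos in range(length):
        running += diff[pos]
        depth.append(running)
    return depth


def undercovered(length: int, reads: list[tuple[int, int]], min_depth: int = 2) -> list[int]:
    """Positions whose depth falls below `min_depth`."""
    return [pos for pos, d in enumerate(coverage(length, reads)) if d < min_depth]


if __name__ == "__main__":
    reads = [(0, 6), (4, 10)]  # hypothetical clone intervals
    print(coverage(10, reads))
    print(undercovered(10, reads))  # positions seen on only one clone
```

The difference-array approach keeps the cost linear in the region length plus the number of reads, which matters little for 2 kb but scales to whole genomes.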

Identification of the unique region of mRNA 2 

The 3'-end of region B has already been identified at the junction sequence 5'-UAAUCUAAAC-3', which separates it from the peplomer coding sequence (Luytjes et al., 1987). The only other potential junction sequence within the consensus sequence of the region B-specific cDNA clones was found at position -9589 (Fig. 1) from the start of the poly(A)-tail of the genome: 5'-AAAUCUAUAC-3' (Fig. 2). Immediately upstream of this sequence an ORF terminates whose primary structure shows a high similarity to the 3'-terminal sequence of the unique region of IBV mRNA F (Boursnell et al., 1987, and data not shown). This strongly suggests that the junction sequence at position -9589 corresponds to the 5'-end of the unique region of mRNA 2.
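The two junction sequences quoted above differ at two of their ten positions. A minimal sketch of a mismatch-tolerant scan for junction-like sequences follows; the threshold of two mismatches is an assumption chosen only so that the two quoted junctions match each other, and the identification in the text also relied on other evidence (the ORF ending immediately upstream and its similarity to IBV mRNA F). The names `hamming` and `find_junction_like` are hypothetical.

```python
JUNCTION_B_C = "UAAUCUAAAC"  # separates region B from the peplomer gene (from the text)
JUNCTION_A_B = "AAAUCUAUAC"  # candidate junction at position -9589 (from the text)


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_junction_like(seq: str, motif: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (0-based start, mismatches) for every window within the threshold."""
    seq = seq.upper().replace("T", "U")  # accept cDNA (T) or RNA (U) input
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        d = hamming(seq[i:i + k], motif)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


if __name__ == "__main__":
    print(hamming(JUNCTION_B_C, JUNCTION_A_B))  # 2
    print(find_junction_like("GGTAATCTAAACGG", JUNCTION_B_C, max_mismatches=0))
```

On a real 2 kb sequence a two-mismatch threshold over ten bases will also return chance hits, which is why additional evidence is needed to call a junction.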

Nucleotide and amino acid sequence 

The consensus nucleotide sequence of region B is 2176 residues long (Fig. 2). It contains two open reading frames. The first open reading frame (ORF1) starts 18 nucleotides downstream from the junction sequence and is 261 amino acids (aa) long. The second ORF (ORF2) starts 903 nucleotides downstream and is 413 aa long. It terminates 23 nucleotides upstream from the junction sequence that separates regions B and C (the peplomer gene). Between ORF1 and ORF2 lies a stretch of 92 nucleotides with several termination codons in each reading frame (see Fig. 2).
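Because ORF2 has no AUG at its start, an ORF in this sense is best described as a stop-to-stop stretch, with the position of the first in-frame AUG examined separately. The sketch below scans the three forward frames of a cDNA sequence (T rather than U) for such stretches; the function names, the 100-codon default, and the toy sequences in the usage are assumptions, and the reverse strand is not scanned.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for stop-free stretches on the given strand.

    `start` is the first base after the preceding in-frame stop codon (or the
    frame offset); `end` is the index just past the terminating stop codon.
    Stretches not closed by a stop codon are ignored in this sketch.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = i + 3
    return orfs


def first_atg(seq: str, start: int, end: int) -> Optional[int]:
    """Index of the first in-frame ATG between start and end, or None."""
    seq = seq.upper().replace("U", "T")
    for i in range(start, end - 2, 3):
        if seq[i:i + 3] == "ATG":
            return i
    return None


if __name__ == "__main__":
    no_atg = "TAA" + "GCT" * 5 + "TGA"  # open frame without an initiation codon
    with_atg = "TAA" + "GCT" + "ATG" + "GCT" * 3 + "TAG"
    for s in (no_atg, with_atg):
        for frame, start, end in stop_to_stop_orfs(s, min_codons=5):
            print(frame, start, end, first_atg(s, start, end))
```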

Analysis of ORF1 

In ORF1 three potential translation initiation codons can be found. The first AUG is in a strong context (Kozak, 1986) and is therefore most probably used. The coding capacity of ORF1 is 30K, which is in agreement with the size of the products obtained after in vitro translation of mRNA 2. There are no membrane protein sequence characteristics, such as a signal sequence, a transmembrane anchor sequence, or potential N-glycosylation sites. Diagon comparison (Staden, 1982) of the
Fig. 1. Cloning and sequencing strategy of the MHV-A59 region B. The upper line represents the MHV genome. Symbols indicate the restriction enzyme recognition sites (specified in the figure) used in subcloning. Vertical bars and the negative numbers above mark the starts of the junction sequences and their distances to the start of the poly(A)-tail of the genome. The arrow points to the position of oligonucleotide 6 (OL 6). Open boxes represent open reading frames. Pol, polymerase; E2, peplomer protein; ORF1 and ORF2 are region B open reading frames. Numbered bars refer to cDNA clones; the direction and extent of sequencing of subclones are indicated by the arrows below.


ORF1 amino acid sequence with available sequences of other coronaviruses did not reveal any similarities. A FASTP similarity search (Lipman and Pearson, 1985) of the NBRF protein library produced alignments to several proteins with nucleotide-binding properties (data not shown). Recently, consensus sequence elements have been published for which an involvement in nucleotide binding is proposed (Dever et al., 1987; Fry et al., 1986). Three regions in the ORF1 sequence match these elements (Fig. 3).
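The consensus elements of Fig. 3 are written with X for any residue and B for a hydrophobic residue (L, V, F, Y, I). A minimal sketch of matching such a consensus against a protein sequence follows. The element sequences of Fig. 3 are not reproduced here; the pattern "GXXXXGKS" in the usage is a P-loop-style ATP-binding pattern chosen purely for illustration, and `motif_to_regex` and `find_motif` are hypothetical names.

```python
import re

HYDROPHOBIC = "LVFYI"  # class "B" as defined in the Fig. 3 legend


def motif_to_regex(motif: str) -> str:
    """Translate a consensus written with X (any residue) and B (hydrophobic) into a regex."""
    parts = []
    for ch in motif:
        if ch == "X":
            parts.append(".")
        elif ch == "B":
            parts.append(f"[{HYDROPHOBIC}]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def find_motif(protein: str, motif: str) -> list[int]:
    """0-based start positions of (possibly overlapping) matches of the consensus."""
    # A zero-width lookahead lets overlapping matches all be reported.
    pattern = re.compile(f"(?={motif_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    print(motif_to_regex("GXXXXGKS"))                 # G....GKS
    print(find_motif("AGDEFHGKSL", "GXXXXGKS"))       # [1]
    print(find_motif("KLDDKADD", "BDD"))              # [1]
```

Fig. 3 also records the spacing between elements; checking those distances would be a second step on top of the individual matches.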

Analysis of ORF2 

ORF2 does not start with an AUG codon; the first potential initiation codon within ORF2 is found at position 110. Interestingly, in the region upstream of ORF2 an AUG codon (position 879) is found in a favorable context; it precedes a short reading frame that is separated from ORF2 by only one opal termination codon (Fig. 2). This short reading frame shows 90% homology at the amino acid level (83% at the nucleotide level) to the N-terminus of the signal sequence of several MHC class I genes (Fig. 4; Schepart et al., 1986). There is no other significant similarity between class I sequences and any MHV sequence. The region overlapping the end of ORF1 and the beginning of ORF2 has been sequenced on three independent cDNA clones. The sequences are identical, excluding the possibility that the presence of the termination codon is a cloning or sequencing artifact.
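Percent identity figures such as the 83% nucleotide value above depend on how the denominator is chosen. A minimal sketch, assuming the two sequences are already aligned to equal length and that columns containing a gap are excluded from the count (one common convention among several), is shown below; `percent_identity` is a hypothetical name and the sequences in the usage are toy examples, not the Fig. 4 alignment.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identity (%) over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))  # 80.0
    print(percent_identity("AC-GT", "ACAGT"))            # 100.0 (gap column skipped)
    # Codons GCT/GCC (Ala) and GAA/GAG (Glu) differ only at third positions:
    print(percent_identity("GCTGAA", "GCCGAG"))          # about 66.7, though both encode Ala-Glu
```

The last line illustrates why identity measured at the nucleotide and amino acid levels for the same stretch need not agree.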

The sequence of ORF2 shows characteristics of a membrane protein: the C-terminal hydrophobic residues (underlined in Fig. 2) could provide a membrane anchor, and 10 potential N-glycosylation sites are present.
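Both features mentioned above can be screened for computationally. The sketch below counts potential N-glycosylation sequons with the usual N-X-S/T rule (X not proline) and scores C-terminal hydrophobicity with the Kyte-Doolittle scale over a 19-residue window. The text does not state how these features were identified; the scale, the window length, the function names, and the toy peptides in the usage are all assumptions.

```python
import re

# Kyte-Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-S/T with X not proline, overlaps allowed


def glycosylation_sites(protein: str) -> list[int]:
    """0-based positions of asparagines in potential N-glycosylation sequons."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value of a segment (unknown letters raise KeyError)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment.upper()) / len(segment)


def c_terminal_hydropathy(protein: str, window: int = 19) -> float:
    """Mean hydropathy of the last `window` residues, a crude membrane-anchor indicator."""
    return mean_hydropathy(protein[-window:])


if __name__ == "__main__":
    print(glycosylation_sites("NGSANPTNAT"))  # [0, 7]; NPT is skipped because X is proline
    print(round(c_terminal_hydropathy("DDDD" + "LIVFL" * 4), 2))
```

A positive mean over a stretch of about 19 residues is commonly taken as a hint of a membrane-spanning segment, but it is only a screening heuristic.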

The most remarkable aspect of the ORF2 sequence came from FASTP analysis of the NBRF protein library: the predicted amino acid sequence encoded by ORF2 shows 30% homology with the HA1 sequence of the hemagglutinin protein of influenza C virus (Nakada et al., 1984; Pfeiffer and Compans, 1984). The alignment presented in Fig. 5 shows that several regions are completely identical and that many conservative substitutions (Dayhoff et al., 1983) are present.
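Identities and conservative substitutions in an alignment like that of Fig. 5 can be tallied column by column. The Fig. 5 legend uses Dayhoff matrix scores; the sketch below substitutes Dayhoff's six exchange groups (C; STPAG; NDEQ; HRK; MILV; FYW) as a simpler stand-in, so its "conservative" counts will not reproduce the colons in Fig. 5 exactly. `compare_aligned` is a hypothetical name and the aligned strings in the usage are toy examples.

```python
# Dayhoff's six exchange groups, used here as a stand-in for matrix-based scoring.
DAYHOFF_GROUPS = ["C", "STPAG", "NDEQ", "HRK", "MILV", "FYW"]
GROUP_OF = {aa: idx for idx, members in enumerate(DAYHOFF_GROUPS) for aa in members}


def compare_aligned(a: str, b: str, gap: str = "-") -> tuple[int, int, int]:
    """Return (identical, conservative, aligned) counts over gap-free columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = conservative = aligned = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gaps inserted to maximize similarity are not scored
        aligned += 1
        if x == y:
            identical += 1
        elif x in GROUP_OF and y in GROUP_OF and GROUP_OF[x] == GROUP_OF[y]:
            conservative += 1
    return identical, conservative, aligned


if __name__ == "__main__":
    same, similar, total = compare_aligned("FG-DS", "YGADT")
    print(same, similar, total)                      # 2 2 4
    print(f"{100 * same / total:.0f}% identical")    # 50% identical
```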

We could not detect similarities between the predicted ORF2 amino acid sequence and other influenza C (or A or B) virus sequences, nor was there any similarity to available coronavirus sequences.

DISCUSSION 

In this paper we present the primary structure of the unique region of MHV-A59 mRNA 2. Sequence analysis revealed two ORFs. ORF1 has a coding capacity of 30K. In vitro translation of mRNA 2 of MHV-JHM (Siddell, 1983) and MHV-A59 (Leibowitz et al., 1982) yielded a 30K protein. Small amounts of a 30K protein have also been detected in MHV-JHM-infected cells (Siddell et al., 1981). This suggests that this protein is encoded by ORF1 of mRNA 2. We assume that the ORF1 translation product is initiated at the 5'-proximal AUG, since this codon is in a preferred context (Kozak, 1986). The presence of three consensus elements in



418 


LUYTJES ET AL. 


MAF ADKPNH F I NFPLAQFSGFMGKYLKLQSQ 31 

lAAATCTATAd nGTCGTGGCTGTGAAAATGGCCrTTGCTGACAAGCCTAATCATTTCATAAACTTTCCCCTGGCCCAATn'AGTGGCTrTATGGGTAAGTATn'AAAGCTACACTC'rCAA 120 

♦OR FI 60 

LVEMGLDCKLQKAPHVSrTLLDIKADQYKQVEFAIQEIID 71 

CrTGTGGAAATGGGTTTAGACTGTAAATTACAGAAGGCACCACATGTrACTATrACCCTGCTTGATATTAAAGCAGACCAATACAAACAGGTGGAATTTGCAATACAAGAAATAATAGAT 240 

180 

DLAAYEGDIVFDNPHMLGRCLVLDVRGFEELHEDIVEILR 111 

GATCTGGCGGCATATGAGGGAGATATTGTCTTTGACAACCCrCACATGCTTGGCAGATGCCrTGTTCTTGATGTTAGAGGATTTGAAGAGTrGCATGAAGATATTGTTGAAATTCTCCGC 360 

300 

RRGCTADQSRHWIPHCTVAQFDEERETKGMQFYHKEPFYL 151 

AGAAGGGGTTGCACGGCAGATCAATCCAGACACTGGATrCCGCACTGCACTGTGGCCCAATrTGACGAAGAAAGAGAAACAAAAGGAATGCAATTCTATCATAAAGAACCCTTCTACCTC 480 

420 

KHNNLLTDAGLELVKIGSSKIDGFYCSELSVWCGERLCYK 191 

AAGCATAACAACCTATTAACGGATGCTGGGCTTGAGCTCGTGAAGATAGGTTCTTCCAAAATAGATGGGTTTTATrGTAGTGAACrGAGTGnTGGTGTGGTGAGAGGCnTGTTATAAG 600 

540 

PPTPKFSDIFGYCCIDKIRGDLEIGDLPQDDEEAWAELSY 231 

CCTCCAACACCCAAATTCAGTGATATATTTGGCTATTGCTGCATAGATAAAATACGTGGTGA'nTAGAAATAGGAGACCTACCGCAGGATGATGAGGAAGCGTGGGCCGAGCTAAGTTAC 720 

660 

♦NEGLYVLICFYTISVI 

*RVVCVDLFLHY*CNK 

HYQRNTYFFRHVHDNSIYFRTVCRMKGCMC*FVFTLLV** 261 

CACTATCAAAGAAACACCTAdTCTTCAGACATGTGCACGATAATAGCATCTATnTCGTACCGTGTGTAGAATGAAGGGTrGTATGTGTTGATTTGTTTTTACACTATrAGTGTAATAA 840 

780 

SLLFC*KGQDVHSYGSSHTAFADLMSAGVWVQ* 
LIILLKRAGCA*LWLLAHCFC*FDVSWC‘LGSMNLLTSFHI 
A Y Y F VEKGRMCIAMAPRTLLLLI*CQLVFGFNEPLNIVSH 16 

GCTrATTATnTGTTGAAAAGGGCAGGATGTGCATAGCTATGGCTCCTCGCACACTGCmTGCTGA'nTGATGTCAGCTGGTGTTTGGG'ITCAATGAACCTCTrAACATCGTTTCACAT 960 

900 ♦0RF2 


LNDDWFLFGDSRSDCTYVENNGHPKLDWLDLDPKLCNSGK 56 
TTAAATGATGACTGGTTTCTATTTGGTGACAGTCGTTCTGACTGTACCTATGTAGAAAATAACGGTCATCCTAAATTAGATrGGCTTGACCTCGACCCAAAGTTGTGTAATTCAGGAAAG 1080 

1020 

ISAKSGNSLFRSFHFTDFYNYTGEGDQIVFYEGVNFSPSH 96 
ATTrCCGCAAAGAGTGGTAACTCTCTCTTTAGGAGTnTCACTTCACTGATTTTTACAATTATACGGGTGAGGGAGACCAAATTGTA'nTTATGAAGGAGTrAATTTrAGTCCCAGCCAT 1200 

1140 

G F KCLAHGDNKRWMGNKARFYARVYEKMAQYRSLSFVNVS 136 
GGCrrrAAATGCCTGGCTCATGGAGATAATAAAAGATGGATGGGCAATAAAGCTCGATTTTATGCCCGAGTGTATGAGAAGATGGCCCAATATAGGAGCCTATCGTTTGTTAATGTGTCT 1320 

1260 

YAYGGNAKPASICKD NTLTL.NNPTFISKESNYVDYYYESE 176 
TATGCCTATGGAGGTAATGCAAAGCCCGCCTCCATrTGCAAAGACAATACTTTAACACTCAATAACCCCACCTTCATATCGAAGGAGTCTAATTATGTTGATTATTACTATGAGAGTGAG 1440 

1380 

ANFTLEGCDEFIVPLCGFNGHSKGSSSDAANKYYTDSQSY 216 
GCTAATTTCACACTAGAAGGTTGTGATGAATn'ATAGTACCGCTCTGTGGTTTTAATGGCCATTCCAAGGGCAGCTCTTCGGATGCTGCCAATAAATATTATACTGACTCTCAGAGTTAC 1560 

1500 

YNMDIGVLYGFNSTLDVGNTAKDPGLDLTCRYIALTPGNY 256 
TATAATATGGATATTGGTGTCrTATATGGGTTCAATTCGACCTTGGATGTTGGCAACACrGCTAAGGATCCGGGTCTrGATCTCACTrGCAGGTATCTTGCATrGACrCCTGGTAATTAT 1680 

1620 

KAVSLEYLLSLPSKA1CLHKTKRFMPVQVVDSRWSSIRQS 296 
AAGGCrGTGTCCTTAGAATATTTGTTAAGCTTACCCTCAAAGGCTA'nTGCCTCCATAAGACAAAGCGCTTTATGCCTGTGCAGGTAGTTGACTCAAGGTGGAGTAGCATCCGCCAGTCA 1800 

1740 

DNMTAAACQLPYCFFRNTSANYSGGTHDAHHGDFHFRQLL 336 
GACAATATGACCGCTGCAGCCTGTCAGCTGCCATATTGTTrC'ITrCGCAACACATCTGCGAATTATAGTGGTGGCACACATGATGCGCACCATGGTGATTTTCATTTCAGGCAGTTATTG 1920 

1860 

SGLLYRVSCIAQQGAFI, YNNVSSSWPAYGYGHCPTAAN1G 376 
TCTGGTTTGTTATATAATGTTTCCTGTATTGCCCAGCAGGGTGCArrrcri'TATAATAATGTTAGTTCCTCTTGGCCAGCCTATGGGTACGGTCATTGTCCAACGGCAGCTAACATTGGT 2040 

1980 

YMAPVCIYDPLP VII, l.GVLLGIA VLIIVFLRVLFY D G * 413 

TATATGGCACCrGTrrGTATCTATGACCCTCTCCCG(7rCATACrGCTAGGTGTGTrA'rTGGGTATAGCTGTGrrGATTATTGTGTrnTGAATGTTrTATnTATGACGGATAGCGGTGT 2160 

2100 


TAGATTGCATGAGGC ^TAATCTAAAd 2186 

Fig. 2. Nucleotide sequence of the MHV-A59 region B and predicted amino acid sequence of the open reading frames ORF1 and ORF2. 
Junction sequences (see text) are boxed. The start of the open reading frames is indicated by the arrows below the sequence. The region 
between ORF1 and ORF2 is translated in three reading frames. The hydrophobic C-terminus of ORF2 is underlined. ORF1 is numbered 1 (M)- 
261 (C), ORF2 is numbered 1 (C)- 413 (G). Nucleotide numbering starts at relative position -9589 from the start of the poly(A)-tail. Single letter 
amino acid code is used. 
Fig. 3. Alignment of ORF1 of MHV-A59 region B to sequence elements that are proposed to be involved in nucleotide binding. ATP, sequence elements involved in ATP binding. GTP, sequence elements involved in GTP binding. ORF1, first open reading frame of region B of MHV-A59. The numbers represent the distances between the elements (the first number is the distance to the start of the sequences). X, any amino acid; B, hydrophobic amino acids L, V, F, Y, I. Data are taken from Fry et al. (1986) and Dever et al. (1987).


the sequence of ORF1 with possible nucleotide-binding and phosphorylating properties (Dever et al., 1987; Fry et al., 1986) suggests a role for its product in virus replication or in phosphorylation of the nucleocapsid protein (Siddell et al., 1982). Experiments are in progress to establish whether the ORF1 product is essential for MHV, given that an mRNA 2 is absent in cells infected with coronaviruses from other antigenic clusters.
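The size estimates used in this discussion (30K for the 261-residue ORF1 product, 45K to 65K for a 413-residue ORF2 product) can be approximated with a simple rule of thumb. The sketch assumes an average residue mass of 110 Da and roughly 2 kDa per occupied N-glycosylation site; both constants are assumptions, not values from the text, and apparent sizes on gels can differ from such estimates.

```python
AVG_RESIDUE_DA = 110.0  # assumed average mass of an amino acid residue
GLYCAN_DA = 2000.0      # assumed mass added per occupied N-glycosylation site


def estimated_kda(n_residues: int, n_glycans: int = 0) -> float:
    """Rough size in kDa from residue count and number of attached N-glycans."""
    return (n_residues * AVG_RESIDUE_DA + n_glycans * GLYCAN_DA) / 1000.0


if __name__ == "__main__":
    print(round(estimated_kda(261), 1))      # ORF1: 28.7
    print(round(estimated_kda(413), 1))      # ORF2, unglycosylated: 45.4
    print(round(estimated_kda(413, 10), 1))  # ORF2, all 10 sites used: 65.4
```

Under these assumptions the three figures land close to the 30K, 45K, and 65K values discussed in the text.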

Unexpected was the presence of a second open reading frame, ORF2, located between ORF1 and the peplomer gene, which lacks a translation initiation codon and shows a remarkable amino acid similarity to the HA1 sequence of influenza C virus. The percentage of identity is high enough to rule out convergent evolution (Dayhoff et al., 1983; Doolittle, 1981). We believe that this similarity is the result of a recombination between coronaviruses and influenza C virus. Recent studies have indicated that coronaviruses are indeed capable of recombination. Makino et al. (1986) described homologous recombination between coronaviruses in mixed infections, and the stretch of 267 nucleotides that we have found in the MHV-A59 peplomer gene and that is absent in MHV-JHM (Luytjes et al., 1987) could indicate a nonhomologous recombination.

In MHV-A59-infected cells a protein that can be assigned to ORF2 has never been detected (Siddell et al., 1982). Since nonfunctional reading frames of RNA viruses show a high rate of mutation (Holland et al., 1982), ORF2 must be either functional or the result of recent genetic changes. In the first case, ORF2 could be translated either by internal initiation at AUG codons in suboptimal contexts (which is unlikely) or by initiation at an upstream AUG codon at position -33 from the start of ORF2 followed by read-through of the opal termination codon at position -3. Opal suppression has been reported for RNA viruses (Strauss et al., 1983; Morch et al., 1987) and can be an important feature of the viral translation strategy. Internal initiation combined with read-through of an opal termination codon would probably lead to undetectable amounts of protein in infected cells. The number and location of termination codons in the region between ORF1 and ORF2 exclude the possibility of frame shifting.
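The upstream-initiation hypothesis can be illustrated with a toy translation. The AUG at position -33 and the opal codon at position -3 are 30 nucleotides apart, so they lie in the same reading frame, and read-through of UGA would fuse the short upstream frame to ORF2. In the sketch below the standard genetic code is used, the residue inserted at a suppressed UGA is shown as the placeholder X because the text does not address it, and the function names and test sequence are hypothetical; suppression efficiency is not modeled.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order at each position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, readthrough_uga: bool = False, readthrough_residue: str = "X") -> str:
    """Translate from the first base until a stop codon.

    With `readthrough_uga=True`, UGA (TGA) is read through and recorded as
    `readthrough_residue`; UAA and UAG always terminate.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            if readthrough_uga and codon == "TGA":
                protein.append(readthrough_residue)
                continue
            break
        protein.append(aa)
    return "".join(protein)


def in_same_frame(pos_a: int, pos_b: int) -> bool:
    """True if two nucleotide positions lie in the same reading frame."""
    return (pos_b - pos_a) % 3 == 0


if __name__ == "__main__":
    print(in_same_frame(-33, -3))                                # True
    toy = "AUGGCUUGAGCUUAA"                                      # AUG ... UGA ... UAA
    print(translate(toy))                                        # MA
    print(translate(toy, readthrough_uga=True))                  # MAXA
```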

In the second case ORF2 could have been acquired recently by recombination between MHV and influenza C virus. However, there is considerable evolutionary distance between the two viruses: the nucleotide sequences of ORF2 and the HA1 gene are not similar, and the codon usage in the two reading frames is different (data not shown). Therefore, recombination must have taken place between ancestors of these viruses. This means that closely related coronaviruses should exist in which ORF2 is still expressed, and that ORF2 in MHV-A59 must have been inactivated recently by genetic changes. An ORF2 product would range in size from 45K (unglycosylated) to 65K (N-glycosylated), and several coronaviruses containing additional proteins in this size range have been reported. MHV-JHM, which shares at least 87% homology with MHV-A59 in the nucleotide sequences from the peplomer gene down to the poly(A)-tail (Luytjes et al., 1987), encodes one additional glycoprotein: gp 65 (Siddell, 1982). Sequence data indicate that the corresponding gene must be located upstream of the peplomer protein gene. Taguchi et al. (1985, 1986) described a JHM variant (CNS) which shows a high expression level of a 65K protein
Fig. 4. Alignment of the MHV-A59 sequence around the start of ORF2 from region B and an MHC class I mRNA: H2-Dp. The MHV-A59 sequence is numbered according to Fig. 2. The H2 sequence is taken from Schepart et al. (1986). The H2-Dp amino acid sequence depicted represents the signal sequence. Identical nucleotides are marked with lines and identical amino acids are boxed.






[Fig. 5 alignment: MHV-A59 ORF2 (residues 1-413) aligned with influenza C virus HA1 and part of HA2 (residues 1-476); the aligned sequences are not recoverable from the extracted text.]


Fig. 5. Alignment of the MHV-A59 ORF2 sequence from region B with the influenza C hemagglutinin HA1 sequence and part of the HA2 sequence. Identical residues are boxed; substitutions scoring 0 or positive according to Dayhoff et al. (1983) are indicated by colons. Dashes represent gaps that were inserted to maximize similarity. The influenza C sequence was taken from Nakada et al. (1986).


and an additional mRNA 2a, intermediate in size between mRNA 2 and mRNA 3. Bovine coronavirus (BCV) shows a strong similarity to MHV-A59 in the nucleocapsid and matrix protein sequences (Lapps et al., 1987), and it contains an additional spike protein E3, a hemagglutinin (King et al., 1985; Deregt et al., 1987). The size of the hemagglutinin monomer is 65K, and BCV also encodes an mRNA 2a (Keck et al., 1987). The data on these coronaviruses lead us to suggest that ORF2 in MHV-A59 corresponds to the reading frames
Fig. 6. Schematic representation of the recombination and mutation events that could have led to the situation in MHV-A59 region B. Drawn lines indicate recombination between the regions marked by dark horizontal bars. POL, polymerase; ORF1 and ORF2 are the reading frames of MHV-A59 region B; E2, peplomer protein gene; HA, hemagglutinin; E3, putative membrane protein (gp 65 of MHV-JHM, hemagglutinin of BCV and OC43); MHC, class I MHC mRNA.


encoding gp 65 in JHM and the 65K hemagglutinin E3 in BCV, and that these genes are located on a separate mRNA 2a in the JHM CNS variant and in BCV. Junction sequences are involved in the initiation of coronavirus mRNAs. The apparent absence of a junction sequence upstream of ORF2 in MHV-A59 explains the absence of an mRNA 2a in infected cells (Spaan et al., 1981; Weiss and Leibowitz, 1983). This could have been the result of an accumulation of recent point mutations. However, the strong similarity at both the amino acid and the nucleotide levels between the region immediately upstream of the opal termination codon (in front of ORF2) and the 5'-end of the coding region of several MHC class I mRNAs indicates that the initiation codon of ORF2 and the upstream junction sequence were lost through a recent nonhomologous recombination event with MHC mRNA.

The suggested homology between ORF2 of MHV-A59 and the BCV E3 gene leads us to propose a model for the relation between several coronaviruses in the antigenic cluster of MHV. Human coronavirus OC43 is closely related to BCV (Lapps and Brian, 1985) and shows sequence similarity to MHV-A59 (Hogue et al., 1984; Weiss, 1983). Since OC43 and influenza C virus cause similar infections in humans (McIntosh et al., 1969; Katagiri et al., 1983), OC43 could have acquired its hemagglutinin gene in a mixed infection. More likely, coinfection of another coronavirus with influenza C virus followed by recombination gave rise to the new coronavirus OC43. The hemagglutinin gene of OC43 and BCV would then be the evolutionary intermediate between influenza C virus HA and MHV ORF2 (see Fig. 6). This model is supported by recent experiments performed in cooperation with Drs. R. Vlasak and P. Palese (Vlasak et al., 1988), which show that BCV and OC43 recognize the same receptor and possess the same esterase activity as has been reported for the influenza C virus hemagglutinin protein (Vlasak et al., 1987).

It has been suggested that virus evolution is a modular event, in which viral genomes are the result of the assembly of a set of primitive genes (see Goldbach, 1987). This mechanism can offer an alternative explanation for the relation between MHV and influenza C virus. However, the similarity with MHC RNA and the previously reported extra stretch of nucleotides in the A59 peplomer gene (Luytjes et al., 1987) indicate that coronaviruses are probably capable of nonhomologous recombination during replication. To date, nonhomologous recombination at the RNA level in animal RNA viruses has been reported only for defective interfering RNA (see King et al., 1987). Coronaviruses are the first example of nontumor RNA viruses able to take up genetic material from the host cell directly into their genome. This may be a strong force in generating strains with new host spectra and tissue tropisms and could have important implications for the prevention of coronavirus infections.